DETAILED ACTION

1. This office action is in response to application 10/695,979 filed on October 29,
2003. Claims 1-33 are pending in the application and have been examined. 

Information Disclosure Statement 

2. The Information Disclosure Statement filed on October 29, 2003 has been 
considered in this application. 

Claim Rejections - 35 USC § 102

3. The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that 
form the basis for the rejections under this section made in this Office action: 

A person shall be entitled to a patent unless - 

(b) the invention was patented or described in a printed publication in this or a foreign country or in public
use or on sale in this country, more than one year prior to the date of application for patent in the United
States.

4. Claims 1-33 are rejected under 35 U.S.C. 102(b) as being anticipated by Henton 
(US Patent 5,860,064). 

5. Consider claim 1, Henton teaches a method (figure 5), comprising:

identifying text to convert to speech (select text, step 501);

selecting a speech style sheet from a set of available speech style sheets, said 
speech style sheet defining desired speech characteristics (Choose vocal emotion for 
selected text; step 503);

marking said text to associate said text with said selected speech style sheet
(figures 2-4 show marking text with colors, size, and boldface in order to associate text 
with a speech style); and 

converting said text to speech having said desired speech characteristics by 
applying a low level markup generated by said speech style sheet (Look up synthesizer 
values for chosen emotion in emotion table [table 2], step 505. Apply speech 
synthesizer vocal emotion values to the chosen text, step 507.). 

6. Consider claim 2, Henton teaches a method according to claim 1, further
comprising:

sending said text with said low level markup to an output device (Obtained vocal 
parameters will be outputted by the text to speech system; column 4, line 45. Values 
shown in Table 2 are input to the speech synthesizer. Column 10, line 42.). 

7. Consider claim 3, Henton teaches a method according to claim 1, further
comprising: 

identifying at least one low level markup (columns of Table 2); 

defining a voice style at least in part by associating said voice style with said at 
least one low level markup (Table 2 gives examples of the defined emotions of the 
preferred embodiment of the present invention with their associated vocal emotion 
values; column 9, line 56.); and

associating a speech style sheet with said voice style (Figure 1, device contains
a memory for holding said vocal emotions parameters associated with emotions,
column 4, line 54. Applicant defines the speech style sheet as a database; page 11,
line 16. Therefore Henton teaches a style sheet.). 

8. Consider claim 4, Henton teaches a method according to claim 3, wherein said 
associating said speech style sheet with said voice style includes: 

creating said speech style sheet (As such, note that the particular values shown 
are easily modifiable, by the system implementer and/or the user, to thus allow for 
differences in cultural interpretations and user/listener perceptions; column 9, line 61. If
parameters are modifiable, one could easily create emotional styles.). 

9. Consider claim 5, Henton teaches a method according to claim 3, wherein said 
associating said speech style sheet with said voice style includes: 

editing said speech style sheet (As such, note that the particular values shown 
are easily modifiable, by the system implementer and/or the user, to thus allow for 
differences in cultural interpretations and user/listener perceptions; column 9, line 61.). 

10. Consider claim 6, Henton teaches a method according to claim 1, wherein said
low level markup defines at least one of a pitch, a prosody, a voice quality, a duration, a
tremor, a timbre, a speed, an intonation, a timing, a volume, and a pronunciation rule
(Table 2 gives examples of the defined emotions of the preferred embodiment of the
present invention with their associated vocal emotion values; column 9, line 56. Table
2, shows pitch mean, range, volume, and speaking rate.). 

11. Consider claim 7, Henton teaches a method according to claim 1, further
comprising: 

providing said speech style sheet to at least one of a text-to-speech developer 
and a text-to-speech device (As such, note that the particular values shown are easily 
modifiable, by the system implementer and/or the user, to thus allow for differences in 
cultural interpretations and user/listener perceptions; column 9, line 61. Style sheets
must be presented to a developer to be modified. Obtained vocal parameters will be
outputted by the text to speech system; column 4, line 45. Values shown in Table 2 are
input to the speech synthesizer. Column 10, line 42.).

12. Consider claim 8, Henton teaches a method according to claim 1, further
comprising: 

compiling a library of speech style sheets. (Figure 1, device contains a memory
for holding said vocal emotions parameters associated with emotions, column 4, line 54.
The vocal parameters associated with an emotion were inherently programmed into
memory.)

13. Consider claim 9, Henton teaches a method according to claim 1, further 
comprising:

identifying at least one low level markup (column 11 lines 28-35 show text
marked up with low level parameters.);

associating a speech style sheet with said at least one low level markup (Column
11 lines 28-35 show text marked up with low level parameters that were a result of
applying different vocal emotions [from table 2] to different portions of text; column 11,
line 1.).

14. Consider claim 10, Henton teaches a method according to claim 1, wherein said
speech style sheet is selected from a menu of available speech style sheets (Figure 2 
shows at the top a menu of emotions.). 

15. Consider claim 11, Henton teaches a method according to claim 1, wherein said
marking of said text includes annotating said text with an annotation such as 
underlining, bolding, italicizing, highlighting, color-coding, coding, adding a symbol, a 
mark, or a design (Figures 2-4 show marking up text using color coding, bolding, and 
font size changes for emotions; column 9, line 7.). 

16. Consider claim 12, Henton teaches a method according to claim 1, wherein said 
converting said text to speech includes: 

identifying said low level markup associated with said speech style sheet 
(Column 11 lines 28-35 show text marked up with low level parameters that were a
result of applying different vocal emotions [from table 2] to different portions of text;
column 11, line 1.); and 

converting said marking of said text to said low level markup (Figures 2-4, text is
marked using color codes to determine an emotion; described in detail column 7 line 60 -
column 9 line 11. Figure 5, Look up synthesizer values for chosen emotion in emotion
table [table 2], step 505. Apply speech synthesizer vocal emotion values to the chosen
text, step 507. Final marked up text with emotion values shown in column 11, line 28-35.).

17. Consider claim 13, Henton teaches a method according to claim 1, wherein said
marking of said text further associates said text with a voice style associated with said 
speech style sheet (Figures 2-4, text is marked using color codes to determine an 
emotion; described in detail column 7 line 60-column 9 line 11. Emotions and
parameters are shown in table 2.). 



18. Consider claim 14, Henton teaches a method according to claim 13, wherein said 
voice style represents at least one of an age, an educational level, an emotion, a 
feeling, a physical trait, a personality trait, and a speech category (Henton teaches a 
method for automatic application of vocal emotion parameters, abstract.). 

Application/Control Number: 10/695,979
Art Unit: 2626

19. Consider claim 15, Henton teaches a method according to claim 1, wherein said
low level markup allows a text-to-speech developer to convey a certain amount of
information using less text. (Column 11 lines 28-35 show text marked up with low level
parameters that were a result of applying different vocal emotions [from table 2] to
different portions of text; column 11, line 1. These low level parameters convey
information using text to the synthesizer.)

20. Consider claim 16, Henton teaches a method according to claim 1, wherein said
selecting is performed by a text-to-speech developer not having expertise in voice arts
(What is needed, therefore, is an intuitive graphical interface for specification and 
modification of vocal emotion of synthetic speech; column 2, line 36. Further, the 
present invention provides for the automatic specification of prosodic controls which 
create vocal emotional affect in synthetic speech produced with a concatenative speech 
synthesizer, column 2, line 64.). 

21. Consider claim 17, Henton teaches a speech style sheet (Figure 1, device
contains a memory for holding said vocal emotions parameters associated with 
emotions, column 4, line 54. Applicant defines the speech style sheet as a database; 
page 11, line 16. Therefore Henton teaches a style sheet.), comprising: 

at least one voice style associated with at least one voice-type, said at least one 
voice style relating a high level markup of said voice-type to a low level markup of said 
voice-type (Device contains a memory for holding said vocal emotions parameters 
associated with emotions, column 4, line 54. Associations are shown in table 2. Figures
2-4 show marking up text using color coding, bolding, and font size to associate
emotions with text for emotions; column 9, line 7.). 

22. Consider claim 18, Henton teaches the speech style sheet according to claim 17,
wherein said high level markup of said voice-type is a text markup (Figures 2-4 show 
marking up text using color coding, bolding, and font size changes for emotions; 
columns 7 line 61 - 9, line 11.).

23. Consider claim 19, Henton teaches the speech style sheet according to claim 17,
wherein said high level markup includes at least one of an underlining, a bolding, an 
italicizing, a highlighting, a color-coding, an annotation, a coding, and an application of 
at least one of a symbol, a mark, and a design (Figures 2-4 show marking up text using 
color coding, bolding, and font size changes for emotions; columns 7 line 61 - 9, line 
11). 



24. Consider claim 20, Henton teaches the speech style sheet according to claim 17, 
wherein said low level markup of said voice-type includes code causing generation of 
speech having particular speech properties (Column 11 lines 28-35 show text marked
up with low level parameters that were a result of applying different vocal emotions
[from table 2] to different portions of text; column 11, line 1. Values shown in Table 2
are input to the speech synthesizer. Column 10, line 42.).

25. Consider claim 21, Henton teaches the speech style sheet according to claim 17,
wherein said low level markup defines at least one of a pitch, a prosody, a voice quality, 
a duration, a tremor, a timbre, speed, an intonation, a timing, a volume, and a 
pronunciation rule (Table 2 gives examples of the defined emotions of the preferred 
embodiment of the present invention with their associated vocal emotion values; column 
9, line 56. Table 2, shows pitch mean, range, volume, and speaking rate.). 

26. Consider claim 22, Henton teaches the speech style sheet according to claim 17, 
wherein said at least one voice style represents style characteristics such as an age, an 
educational level, an emotion, a feeling, a physical trait, a personality trait, and a speech 
category (Henton teaches a method for automatic application of vocal emotion 
parameters, abstract.). 

27. Consider claim 23, Henton teaches the speech style sheet according to claim 17, 
wherein said speech style sheet is at least one of a programming object, a programming 
module, a computer program, or a computer file (Figure 1, device contains a memory
for holding said vocal emotions parameters associated with emotions, column 4, line 54. 
The parameters must be saved in a computer file or program object to be stored by 
memory.). 



28. Consider claim 24, Henton teaches an apparatus (figure 1), comprising:

a processor having access to at least one speech style sheet (CPU 11,
connected to memory 17. Memory holds vocal emotion parameters associated with 
emotions; column 4, line 54.), said at least one speech style sheet containing a 
definition of a voice style associated with a voice-type, and said definition relating a high 
level markup of said voice-type to a low level markup of said voice-type (Device 
contains a memory for holding said vocal emotions parameters associated with 
emotions, column 4, line 54. Associations are shown in table 2. Figures 2-4 show 
marking up text using color coding, bolding, and font size to associate emotions with
text for emotions; column 9, line 7.), wherein said processor is operative to convert said 
high level markup to said low level markup (Look up synthesizer values for chosen 
emotion in emotion table [table 2], step 505. Apply speech synthesizer vocal emotion 
values to the chosen text, step 507.); 

a user interface device for applying said at least one voice style to text 
associated with said voice-type, said user interface being in communication with said 
processor (Figure 1, a keyboard 13, or other textual input device such as a write-on 
tablet or touch screen, provides input to the CPU/memory unit 11, as does input
controller 15 which by way of example can be a mouse, a 2-D trackball, a joystick, etc.; 
column 5, line 22.); and 

an output device connected to said processor for converting said text with said 
low level markup to speech (figure 1, output 21. Values shown in Table 2 are input to 
the speech synthesizer. Column 10, line 42.).

29. Consider claim 25, Henton teaches the apparatus of claim 24, wherein said
processor includes at least one of a text-to-speech engine (The preferred manner in 
which this invention would be implemented is in the context of creating vocal emotions 
that may be associated with text that is to be read by a text-to-speech synthesizer; 
column 9, line 15.) and a text normalizer (a simple linear normalization is then 
performed in the preferred embodiment of the present invention in order to translate the 
graphical modifications to the resulting vocal emotion effect; column 9, line 38). 

30. Consider claim 26, Henton teaches the apparatus according to claim 24, wherein 
said low level markup defines at least one of a pitch, a prosody, a voice quality, a 
duration, a tremor, a timbre, a speed, an intonation, a timing, a volume, and a 
pronunciation rule (Table 2 gives examples of the defined emotions of the preferred 
embodiment of the present invention with their associated vocal emotion values; column 
9, line 56. Table 2, shows pitch mean, range, volume, and speaking rate.). 

31. Consider claim 27, Henton teaches the apparatus according to claim 24, wherein
said high level markup includes at least one of an underlining, a bolding, an italicizing, a
highlighting, a color-coding, an annotation, a coding, and an application of at least one
of a symbol, a mark, and a design (Figures 2-4 show marking up text using color
coding, bolding, and font size changes for emotions; columns 7 line 61 - 9, line 11.).

32. Consider claim 28, Henton teaches the apparatus according to claim 24, wherein
said voice style represents at least one of an age, an educational level, an emotion, a 
feeling, a physical trait, a personality trait, and a speech category (Henton teaches a 
method for automatic application of vocal emotion parameters, abstract.).

33. Consider claim 29, Henton teaches a system (Figure 1), comprising:

a designer device for creating speech style sheets (As such, note that the 
particular values shown are easily modifiable, by the system implementer and/or the 
user, to thus allow for differences in cultural interpretations and user/listener 
perceptions; column 9, line 61. If parameters are modifiable, one could easily create 
emotional styles.); 

a speech style sheet at least partially created by said designer device, said
speech style sheet defining a voice style (Figure 1, device contains a memory for
holding said vocal emotions parameters associated with emotions, column 4, line 54.
Applicant defines the speech style sheet as a database; page 11, line 16. Therefore
Henton teaches a style sheet.); 

a text-to-speech device for receiving text associated with a voice-type (The 
preferred manner in which this invention would be implemented is in the context of 
creating vocal emotions that may be associated with text that is to be read by a text-to- 
speech synthesizer; column 9, line 15.), said text having a high level markup associated 
with said voice style (Figures 2-4 show marking up text using color coding, bolding, and 
font size changes for emotions; columns 7 line 61 - 9, line 11.), said text-to-speech
device having access to said speech style sheet (CPU 11, connected to memory 17.
Memory holds vocal emotion parameters associated with emotions; column 4, line 54.) 
and also having: 

a memory for storing computer executable code (figure 1, memory 17); and

a processor for executing the program code stored in memory (CPU 11),
wherein the program code includes:

code to determine, by accessing said speech style sheet, a low
level markup associated with said high level markup (Figure 5, Look up
synthesizer values for chosen emotion in emotion table [table 2], step 505.);
and

code to convert said high level markup of said text to said low level 
markup (Apply speech synthesizer vocal emotion values to the chosen 
text, step 507.); and

an output device for producing expressive speech using said text with said low
level markup, said output device in communication with said text-to-speech device
(figure 1, output 21. Values shown in Table 2 are input to the speech synthesizer.
Column 10, line 42.)

34. Consider claim 30, Henton teaches the system according to claim 29, further 
comprising:

a developer device in communication with said text-to-speech device (Figure 1, a
keyboard 13, or other textual input device such as a write-on tablet or touch screen,
provides input to the CPU/memory unit 11, as does input controller 15 which by way of
example can be a mouse, a 2-D trackball, a joystick, etc.; column 5, line 22.), said 
developer device for marking text and providing said text to said text-to-speech device 
(Figures 2-4 show marking up text using color coding, bolding, and font size changes for 
emotions; columns 7 line 61 - 9, line 11.). 

35. Consider claim 31, Henton teaches the system according to claim 29, further
comprising: 

a user interface device in communication with said text-to-speech device (Figure
1, a keyboard 13, or other textual input device such as a write-on tablet or touch screen,
provides input to the CPU/memory unit 11, as does input controller 15 which by way of
example can be a mouse, a 2-D trackball, a joystick, etc.; column 5, line 22.), said user 
interface device for applying high level markup to text and providing said text to said 
text-to-speech device (Figures 2-4 show marking up text using color coding, bolding, 
and font size changes for emotions; columns 7 line 61 - 9, line 11.). 

36. Consider claim 32, Henton teaches an article of manufacture (figure 1), 
comprising: 

a computer usable medium having computer readable program code means 
embodied therein for producing expressive text-to-speech (External storage 17, which
can include fixed disk drives, floppy disk drives, memory cards, etc., is used for mass
storage of programs and data; column 5, line 26. Method, figure 5.), comprising: 

computer readable program code means for identifying text to convert to 
speech (select text, step 501);

computer readable program code means for selecting a speech style 
sheet from a set of available speech style sheets, said speech style sheet 
defining desired speech characteristics (Choose vocal emotion for selected text; 
step 503); 

computer readable program code means for marking said text to associate 
said text with said selected speech style sheet (figures 2-4 show marking text 
with colors, size, and boldface in order to associate text with a speech style); and 

computer readable program code means for converting said text to 
speech having said desired speech characteristics by applying a low level 
markup associated with said speech style sheet (Look up synthesizer values for 
chosen emotion in emotion table [table 2], step 505. Apply speech synthesizer 
vocal emotion values to the chosen text, step 507.). 

37. Consider claim 33, Henton teaches a system for producing expressive text-to- 
speech, (system figure 1, Method figure 5), comprising: 

means for identifying text to convert to speech (select text, step 501);

means for selecting a speech style sheet from a set of available speech style
sheets, said speech style sheet defining desired speech characteristics (Choose vocal 
emotion for selected text; step 503); 

means for marking said text to associate said text with said selected speech style 
sheet (figures 2-4 show marking text with colors, size, and boldface in order to 
associate text with a speech style); and 

means for converting said text to speech having said desired speech 
characteristics by applying a low level markup associated with said speech style sheet 
(Look up synthesizer values for chosen emotion in emotion table [table 2], step 505. 
Apply speech synthesizer vocal emotion values to the chosen text, step 507.). 

Conclusion 

38. The prior art made of record and not relied upon, which is considered pertinent to
applicant's disclosure, is included on the Notice of References Cited (PTO-892).

Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Douglas C. Godbold whose telephone number is (571)
270-1451. The examiner can normally be reached on Monday-Thursday 7:00am-4:30pm
and Friday 7:00am-3:30pm.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Patrick Edouard, can be reached on (571) 272-7603. The fax phone number
for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a 
USPTO Customer Service Representative or access to the automated information 
system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.